Introducing a Performance Model for Bandwidth-Limited Loop Kernels

نویسندگان

  • Jan Treibig
  • Georg Hager
چکیده

We present a performance model for bandwidth limited loop kernels which is founded on the analysis of modern cache based microarchitectures. This model allows an accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance for different memory hierarchy levels is made up. The performance of raw memory load, store and copy operations and a stream vector triad are analyzed and benchmarked on three modern x86-type quad-core architectures in order to demonstrate the capabilities of the model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-core architectures: Complexities of performance prediction and the impact of cache topology

The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to in-cache situations and modern multi-core architectures yields unsatisfactory results. This paper analyzes the influence of cache hierarchy design on performance predictions for bandwidth-limited loop kernels on current mainstream processors. We present a diagno...

متن کامل

Robust Controller Design Based-on Aerodynamic Load Simulator Identification Driven by PMSM for Hardware-in-the-Loop Simulations

Aerodynamic load simulators generate the required time varying load to test the actuator’s performance in the laboratory. Electric Load Simulator (ELS) as one of variety of the dynamic load simulators should follows the rotation of the Under Test Actuator (UTA) and applies the desired torque to UTA’s rotor at the same time. In such a situation, a very large torque is imposed to the ELS from the...

متن کامل

A-New-Closed-form-Mathematical-Approach-to-Achieve Minimum Phase Noise in Frequency Synthesizers

The aim of this paper is to minimize output phase noise for the pure signal synthesis in the frequency synthesizers. For this purpose, first, an exact mathematical model of phase locked loop (PLL) based frequency synthesizer is described and analyzed. Then, an exact closed-form formula in terms of synthesizer bandwidth and total output phase noise is extracted. Based on this formula, the phase ...

متن کامل

Design of a Single-Layer Circuit Analog Absorber Using Double-Circular-Loop Array via the Equivalent Circuit Model

A broadband Circuit Analogue (CA) absorber using double-circular-loop array is investigated in this paper. A simple equivalent circuit model is presented to accurately analyze this CA absorber. The circuit simulation of the proposed model agrees well with full-wave simulations. Optimization based the equivalent circuit model, is applied to design a single-layer circuit analogue absorber using d...

متن کامل

Wave Equation Based Stencil Optimizations on Multi-core CPU

As the engine for seismic imaging algorithms, stencil kernels modeling wave propagation are both computeand memoryintensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009